The main functions for working with the $\chi^2$-distribution are chi2.rvs(), chi2.pdf(), chi2.cdf(), and chi2.ppf() from the scipy.stats package. The chi2.pdf() function gives the density, the chi2.cdf() function gives the cumulative distribution function, the chi2.ppf() function gives the quantile function (the percent point function, which is the inverse of the cdf), and the chi2.rvs() function generates random deviates.

We use chi2.pdf(x, df, loc=0, scale=1) to calculate the density at the integer values 4 to 8 of a $\chi^2$-curve with $df=7$.

In [2]:
# First, let's import all the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
In [3]:
x = np.arange(4, 8.01, 1)
stats.chi2.pdf(x, df=7)
Out[3]:
array([0.11518073, 0.12204152, 0.11676522, 0.10411977, 0.08817914])
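
As a cross-check, the density values can be reproduced from the closed-form expression $f(x; k) = \frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}$ for $x > 0$, where $k$ denotes the degrees of freedom. The following is a minimal sketch; the helper name chi2_pdf_manual is our own and is not part of scipy.stats.

```python
import numpy as np
from scipy import special, stats


def chi2_pdf_manual(x, df):
    """Evaluate the chi-square density from its closed-form expression (x > 0)."""
    x = np.asarray(x, dtype=float)
    half_df = df / 2.0
    numerator = x ** (half_df - 1) * np.exp(-x / 2.0)
    denominator = 2.0**half_df * special.gamma(half_df)
    return numerator / denominator


x = np.arange(4, 8.01, 1)
manual = chi2_pdf_manual(x, df=7)
reference = stats.chi2.pdf(x, df=7)

# Both approaches should agree up to floating-point rounding.
print(np.allclose(manual, reference))
```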

We use chi2.cdf() to calculate the area under the curve for the interval $[0,6]$ and the interval $[6, \infty)$ of a $\chi^2$-curve with $df=7$. Further, we ask Python whether the areas of the intervals $[0,6]$ and $[6, \infty)$ sum to 1:

In [4]:
# interval [0, 6]
stats.chi2.cdf(6, df=7)
Out[4]:
0.4602506496044429
In [5]:
# interval [6, inf)
1 - stats.chi2.cdf(6, df=7)
Out[5]:
0.539749350395557
In [6]:
(1 - stats.chi2.cdf(6, df=7)) + stats.chi2.cdf(6, df=7) == 1
Out[6]:
True
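
The upper-tail area can also be obtained directly with chi2.sf(), the survival function, which is defined as 1 - cdf. Because floating-point arithmetic is involved, comparing with a tolerance via np.isclose() is generally more robust than an exact == test. A minimal sketch, with variable names of our own choosing:

```python
import numpy as np
from scipy import stats

lower = stats.chi2.cdf(6, df=7)  # area over [0, 6]
upper = stats.chi2.sf(6, df=7)   # area over [6, inf), i.e. 1 - cdf

# Compare with a tolerance instead of exact floating-point equality.
print(np.isclose(upper, 1 - lower))
print(np.isclose(lower + upper, 1.0))
```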

We use chi2.ppf() to calculate the quantile for a given area (= probability) under a $\chi^2$-curve with $df=7$, for the probabilities $q = 0.25, 0.5, 0.75$ and $0.999$. For each $q$, chi2.ppf(q, df) returns the value $x$ for which the area under the curve over the interval $[0, x]$ equals $q$.

In [7]:
stats.chi2.ppf(0.25, 7)
Out[7]:
4.2548521835465145
In [8]:
stats.chi2.ppf(0.5, 7)
Out[8]:
6.345811195521515
In [9]:
stats.chi2.ppf(0.75, 7)
Out[9]:
9.037147547908143
In [10]:
stats.chi2.ppf(0.999, 7)
Out[10]:
24.321886347856854
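
chi2.ppf() also accepts an array of probabilities. Since chi2.ppf() and chi2.cdf() are inverses of each other, feeding the resulting quantiles back into chi2.cdf() should recover the original probabilities. A minimal sketch of this round trip (variable names are our own):

```python
import numpy as np
from scipy import stats

probs = np.array([0.25, 0.5, 0.75, 0.999])

# Quantiles for all probabilities in a single vectorized call.
quantiles = stats.chi2.ppf(probs, df=7)

# Applying the cdf to the quantiles should return the probabilities.
recovered = stats.chi2.cdf(quantiles, df=7)

print(quantiles)
print(np.allclose(recovered, probs))
```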

We use the chi2.rvs(df, loc=0, scale=1, size=1) function to generate 100,000 random values (size) from the $\chi^2$-distribution with $df=7$. Thereafter we plot a histogram and compare it to the probability density function of the $\chi^2$-distribution with $df=7$ (orange line).

In [11]:
rand_chi2_samples = stats.chi2.rvs(df=7, size=100000)

plt.figure(figsize=(10, 5))
plt.hist(
    rand_chi2_samples,
    density=True,
    color="lightgrey",
    edgecolor="darkgrey",
    bins="scott",
)

plt.title("Histogram for the $\\chi^2$-distribution with 7 degrees of freedom (df)")

plt.plot(
    np.arange(0, 20, 0.1),
    stats.chi2.pdf(np.arange(0, 20, 0.1), df=7),
    "-",
    linewidth=2,
    color="orange",
)
plt.xlabel("samples")
plt.ylabel("Density")

plt.xlim(0, 16)
plt.show()
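
As a numerical complement to the plot, the sample mean and sample variance can be compared with the theoretical values of the $\chi^2$-distribution, which are $df$ and $2 \cdot df$, respectively. A minimal sketch; the seed passed via random_state is an arbitrary choice made here only for reproducibility.

```python
import numpy as np
from scipy import stats

df = 7

# Fixed seed (an arbitrary choice) so that repeated runs give the same sample.
samples = stats.chi2.rvs(df=df, size=100000, random_state=42)

# Theoretical moments of the chi-square distribution: mean = df, variance = 2 * df.
mean_theory = stats.chi2.mean(df)
var_theory = stats.chi2.var(df)

print("sample mean:    ", samples.mean(), "theoretical:", mean_theory)
print("sample variance:", samples.var(ddof=1), "theoretical:", var_theory)
```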

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.